Picture for Ziyang Ma

Ziyang Ma

CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems

Add code
May 28, 2026
Viaarxiv icon

Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems

Add code
May 25, 2026
Viaarxiv icon

Evaluating the Expressive Appropriateness of Speech in Rich Contexts

Add code
May 10, 2026
Viaarxiv icon

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

Add code
May 07, 2026
Viaarxiv icon

Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse

Add code
Apr 06, 2026
Viaarxiv icon

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Add code
Mar 16, 2026
Viaarxiv icon

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Add code
Feb 15, 2026
Viaarxiv icon

Audio ControlNet for Fine-Grained Audio Generation and Editing

Add code
Feb 04, 2026
Viaarxiv icon

From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development

Add code
Jan 31, 2026
Viaarxiv icon

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Add code
Jan 14, 2026
Viaarxiv icon